⚡️ Speed up method `GeoCoordinate._to_dict` by 143% #92

codeflash-ai · 2025-10-23T05:11:04Z

📄 143% (1.43x) speedup for `GeoCoordinate._to_dict` in `weaviate/collections/classes/types.py`

⏱️ Runtime : 1.27 milliseconds → 524 microseconds (best of 166 runs)
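
The headline percentage can be reproduced from the two runtimes. A minimal sketch, assuming the figure is computed as (original - optimized) / optimized, which matches the per-test annotations in this report. The helper name `speedup_percent` is hypothetical and not part of Codeflash:

```python
def speedup_percent(original_ns: float, optimized_ns: float) -> float:
    """Return the speedup as a percentage of the optimized runtime."""
    if optimized_ns <= 0:
        raise ValueError("optimized runtime must be positive")
    return (original_ns - optimized_ns) / optimized_ns * 100.0


# Runtimes quoted in this report, converted to nanoseconds.
overall = speedup_percent(1_270_000, 524_000)  # 1.27 ms -> 524 us
batch = speedup_percent(400_000, 165_000)      # 400 us -> 165 us (500-instance loop)
print(f"{overall:.1f}% overall, {batch:.1f}% for the batch loop")
```

With the rounded runtimes shown here, the overall figure lands between 142% and 143%, which is within rounding of both percentages quoted in this report.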

📝 Explanation and details

The optimization replaces Pydantic's `model_dump(exclude_none=True)` call with a dictionary comprehension that filters `None` values out of `self.__dict__`. This yields the reported **143% speedup** (1.27 ms → 524 μs) by skipping Pydantic's serialization machinery.

**Key changes:**

- **Direct attribute access**: `{k: v for k, v in self.__dict__.items() if v is not None}` bypasses Pydantic's model dumping process
- **Simpler filtering**: Uses a native Python dictionary comprehension instead of Pydantic's `exclude_none=True` parameter processing
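
The change described above can be sketched as follows. This is an illustration under stated assumptions, not the code from `weaviate/collections/classes/types.py`: `PlainCoordinate` is a hypothetical stand-in written without Pydantic so the snippet runs with only the standard library, and its optional `altitude` attribute exists only to show the `None` filter doing something.

```python
from typing import Any, Dict, Optional


class PlainCoordinate:
    """Hypothetical stand-in for a model whose field values live in __dict__."""

    def __init__(
        self,
        latitude: float,
        longitude: float,
        altitude: Optional[float] = None,
    ) -> None:
        self.latitude = latitude
        self.longitude = longitude
        self.altitude = altitude  # illustrative only; not a GeoCoordinate field

    def _to_dict(self) -> Dict[str, Any]:
        # Same shape as the optimized method: copy the instance attributes,
        # skipping any whose value is None.
        return {k: v for k, v in self.__dict__.items() if v is not None}


without_altitude = PlainCoordinate(52.37, 4.9)._to_dict()
with_altitude = PlainCoordinate(52.37, 4.9, altitude=3.0)._to_dict()
print(without_altitude)
print(with_altitude)
```

Pydantic v2 models also keep declared field values in the instance `__dict__`, which is what lets the same comprehension work on `GeoCoordinate`.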

**Why this is faster:**

- Pydantic's `model_dump()` runs a general-purpose serializer (option handling such as `exclude_none`, per-field serialization, and recursive conversion of values), which makes the original roughly 2.4x slower in this benchmark (1.27 ms vs. 524 μs)
- A dictionary comprehension over `__dict__` is a lightweight operation that maps directly to the desired output
- Since `GeoCoordinate` has only two plain float fields (`latitude` and `longitude`), both required and checked by Pydantic `Field` constraints at construction, `__dict__` already holds the values `model_dump()` would emit, so the direct approach is safe for this class
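
The safety argument depends on the fields being plain floats: a `__dict__` comprehension returns stored values as they are and does not recurse into nested objects, whereas a full serializer converts them. A small sketch of that difference, using hypothetical stand-in classes (`Point`, `Region`) and a hand-written `to_dict_recursive` helper rather than Pydantic itself:

```python
from typing import Any, Dict


class Point:
    """Hypothetical flat object with two float attributes."""

    def __init__(self, latitude: float, longitude: float) -> None:
        self.latitude = latitude
        self.longitude = longitude


class Region:
    """Hypothetical container that holds a nested object."""

    def __init__(self, name: str, center: Point) -> None:
        self.name = name
        self.center = center


def to_dict_shallow(obj: Any) -> Dict[str, Any]:
    # The fast path: values are copied by reference, never converted.
    return {k: v for k, v in obj.__dict__.items() if v is not None}


def to_dict_recursive(obj: Any) -> Dict[str, Any]:
    # Simplified stand-in for a full serializer: nested objects become dicts.
    out: Dict[str, Any] = {}
    for key, value in obj.__dict__.items():
        if value is None:
            continue
        out[key] = to_dict_recursive(value) if hasattr(value, "__dict__") else value
    return out


point = Point(1.0, 2.0)
region = Region("demo", point)

# For a flat object the two approaches agree.
flat_matches = to_dict_shallow(point) == to_dict_recursive(point)
# For a nested object the shallow result still contains the Point instance.
shallow_region = to_dict_shallow(region)
recursive_region = to_dict_recursive(region)
print(flat_matches, type(shallow_region["center"]).__name__, recursive_region)
```

For a class shaped like `GeoCoordinate` (two floats, nothing nested) the two paths produce the same dictionary, which is the case this PR relies on.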

**Performance characteristics:**
The optimization shows consistent 129-360% speedups across all timed test cases, with particularly strong gains on:

- High-precision floats (352% faster)
- Extended classes with additional fields (315% faster)
- Basic coordinate operations (155-310% faster)
- Large batch processing (139-142% faster)

This optimization is most useful for high-frequency coordinate processing where `_to_dict()` is called repeatedly, since it removes the Pydantic serialization overhead while keeping the same output format.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 5555 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

# imports
import pytest
# function to test
from weaviate.collections.classes.types import GeoCoordinate

# unit tests

# 1. Basic Test Cases

def test_basic_valid_coordinates():
    # Test with valid latitude and longitude
    geo = GeoCoordinate(latitude=45.0, longitude=90.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 5.89μs -> 1.44μs (310% faster)
    # Test with negative values
    geo = GeoCoordinate(latitude=-45.0, longitude=-90.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.48μs -> 523ns (182% faster)

def test_basic_zero_coordinates():
    # Test with zero values
    geo = GeoCoordinate(latitude=0.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.99μs -> 1.07μs (178% faster)

def test_basic_decimal_coordinates():
    # Test with decimal values
    geo = GeoCoordinate(latitude=12.345, longitude=67.89)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.87μs -> 1.05μs (172% faster)

# 2. Edge Test Cases

def test_edge_latitude_bounds():
    # Test at latitude upper bound
    geo = GeoCoordinate(latitude=90.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.69μs -> 1.02μs (163% faster)
    # Test at latitude lower bound
    geo = GeoCoordinate(latitude=-90.0, longitude=0.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.18μs -> 462ns (155% faster)

def test_edge_longitude_bounds():
    # Test at longitude upper bound
    geo = GeoCoordinate(latitude=0.0, longitude=180.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 2.59μs -> 940ns (175% faster)
    # Test at longitude lower bound
    geo = GeoCoordinate(latitude=0.0, longitude=-180.0)
    codeflash_output = geo._to_dict(); result = codeflash_output # 1.05μs -> 460ns (129% faster)

def test_edge_invalid_latitude():
    # Latitude too high
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=90.1, longitude=0.0)
    # Latitude too low
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=-90.1, longitude=0.0)

def test_edge_invalid_longitude():
    # Longitude too high
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=180.1)
    # Longitude too low
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=-180.1)




def test_edge_float_precision():
    # Test with high precision floats
    geo = GeoCoordinate(latitude=12.123456789, longitude=34.987654321)
    codeflash_output = geo._to_dict(); result = codeflash_output # 5.92μs -> 1.31μs (352% faster)

# 3. Large Scale Test Cases

def test_large_scale_many_instances():
    # Test creating and dumping many instances
    coordinates = [
        GeoCoordinate(latitude=float(i % 180 - 90), longitude=float(i % 360 - 180))
        for i in range(1000)
    ]
    dicts = [g._to_dict() for g in coordinates]
    for i, d in enumerate(dicts):
        pass

def test_large_scale_performance(monkeypatch):
    # This test checks that _to_dict is not unnecessarily slow for a large batch
    import time
    coordinates = [GeoCoordinate(latitude=0.0, longitude=0.0) for _ in range(500)]
    start = time.time()
    for geo in coordinates:
        geo._to_dict() # 400μs -> 165μs (142% faster)
    elapsed = time.time() - start

def test_large_scale_unique_values():
    # Test that all unique values are preserved in output
    lats = [float(i % 180 - 90) for i in range(500)]
    lons = [float(i % 360 - 180) for i in range(500)]
    coordinates = [GeoCoordinate(latitude=lat, longitude=lon) for lat, lon in zip(lats, lons)]
    dicts = [geo._to_dict() for geo in coordinates]
    for i, d in enumerate(dicts):
        pass

# 4. Determinism Test

def test_determinism_same_input_same_output():
    # The same input should always give the same output
    geo1 = GeoCoordinate(latitude=23.5, longitude=42.1)
    geo2 = GeoCoordinate(latitude=23.5, longitude=42.1)
    codeflash_output = geo1._to_dict() # 5.90μs -> 1.28μs (360% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import pytest  # used for our unit tests
# function to test
from pydantic import Field  # used by ExtendedGeoCoordinate below
from weaviate.collections.classes.types import GeoCoordinate

# unit tests

# ------------------------
# 1. Basic Test Cases
# ------------------------

def test_basic_positive_coordinates():
    # Test with valid positive latitude and longitude
    coord = GeoCoordinate(latitude=45.0, longitude=90.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.66μs -> 1.08μs (239% faster)

def test_basic_negative_coordinates():
    # Test with valid negative latitude and longitude
    coord = GeoCoordinate(latitude=-45.0, longitude=-90.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.22μs -> 1.08μs (199% faster)

def test_basic_zero_coordinates():
    # Test with zero latitude and longitude
    coord = GeoCoordinate(latitude=0.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.94μs -> 1.08μs (173% faster)

def test_basic_float_precision():
    # Test with float values with precision
    coord = GeoCoordinate(latitude=12.345678, longitude=98.7654321)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.79μs -> 1.06μs (165% faster)

# ------------------------
# 2. Edge Test Cases
# ------------------------

def test_latitude_upper_bound():
    # Test with latitude at upper bound
    coord = GeoCoordinate(latitude=90.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.78μs -> 1.03μs (170% faster)

def test_latitude_lower_bound():
    # Test with latitude at lower bound
    coord = GeoCoordinate(latitude=-90.0, longitude=0.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.71μs -> 1.02μs (167% faster)

def test_longitude_upper_bound():
    # Test with longitude at upper bound
    coord = GeoCoordinate(latitude=0.0, longitude=180.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.71μs -> 998ns (172% faster)

def test_longitude_lower_bound():
    # Test with longitude at lower bound
    coord = GeoCoordinate(latitude=0.0, longitude=-180.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 2.78μs -> 996ns (180% faster)

def test_latitude_out_of_bounds_high():
    # Test latitude above upper bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=91.0, longitude=0.0)

def test_latitude_out_of_bounds_low():
    # Test latitude below lower bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=-91.0, longitude=0.0)

def test_longitude_out_of_bounds_high():
    # Test longitude above upper bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=181.0)

def test_longitude_out_of_bounds_low():
    # Test longitude below lower bound should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=-181.0)

def test_missing_latitude():
    # Test missing latitude should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(longitude=0.0)

def test_missing_longitude():
    # Test missing longitude should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0)

def test_none_latitude():
    # Test latitude set to None should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=None, longitude=0.0)

def test_none_longitude():
    # Test longitude set to None should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude=None)

def test_non_float_latitude():
    # Test latitude as string should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude='not_a_float', longitude=0.0)

def test_non_float_longitude():
    # Test longitude as string should raise validation error
    with pytest.raises(ValueError):
        GeoCoordinate(latitude=0.0, longitude='not_a_float')

def test_extra_fields_ignored():
    # Test _to_dict on a subclass that declares an additional field (altitude)
    class ExtendedGeoCoordinate(GeoCoordinate):
        altitude: float = Field(default=100.0)

    coord = ExtendedGeoCoordinate(latitude=10.0, longitude=20.0, altitude=500.0)
    codeflash_output = coord._to_dict(); result = codeflash_output # 5.22μs -> 1.26μs (315% faster)

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

def test_many_instances():
    # Test creating many instances and dumping them
    coords = [
        GeoCoordinate(latitude=float(i % 181 - 90), longitude=float(i % 361 - 180))
        for i in range(1000)
    ]
    for i, coord in enumerate(coords):
        codeflash_output = coord._to_dict(); result = codeflash_output # 807μs -> 338μs (139% faster)
        expected_lat = float(i % 181 - 90)
        expected_lon = float(i % 361 - 180)

def test_large_float_precision():
    # Test with high-precision float values just inside the upper bounds
    coord = GeoCoordinate(latitude=89.999999999, longitude=179.999999999)
    codeflash_output = coord._to_dict(); result = codeflash_output # 3.79μs -> 1.03μs (269% faster)

def test_performance_under_load():
    # Test performance by creating and dumping many objects
    import time
    start = time.time()
    coords = [GeoCoordinate(latitude=0.0, longitude=0.0) for _ in range(1000)]
    dicts = [coord._to_dict() for coord in coords]
    end = time.time()
    # Ensure all dicts are correct
    for d in dicts:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from weaviate.collections.classes.types import GeoCoordinate

def test_GeoCoordinate__to_dict():
    GeoCoordinate._to_dict(GeoCoordinate(latitude=0.0, longitude=0.0))

Timer unit: 1e-09 s

To edit these changes, run `git checkout codeflash/optimize-GeoCoordinate._to_dict-mh2ys2xa` and push.


codeflash-ai bot requested a review from mashraf-222 October 23, 2025 05:11

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 23, 2025
